ABSTRACT

A novel variable-size block multi-resolution motion estimation
(MRME) scheme is presented. The motion estimation scheme can be
used to estimate motion vectors in subband coding, wavelet coding,
and other pyramid coding systems for video compression. In the
MRME scheme, the motion vectors in the highest layer of the
pyramid are first estimated; these motion vectors are then used
as the initial estimate for the next layer and gradually refined.
A variable block size is used to adapt to its level in the pyramid.
This scheme not only considerably reduces the searching and
matching time but also provides a meaningful characterization of
the intrinsic motion structure. In addition, the variable-size
MRME approach avoids the drawback of the constant-size MRME in
describing small object motion activities. The proposed
variable-block size MRME scheme can be used in estimating motion
vectors for different video source formats and resolutions
including video telephone, NTSC/PAL/SECAM, and HDTV applications.

8 Claims, 7 Drawing Sheets 
VARIABLE-BLOCK SIZE
MULTI-RESOLUTION MOTION 
ESTIMATION SCHEME FOR PYRAMID 
CODING 



FIELD OF THE INVENTION 

The present invention relates to video coding and, more 
particularly, to a motion estimation scheme. 

BACKGROUND OF THE INVENTION 

A video frame may be characterized by a multiresolution 
signal representation. Existing multi-resolution motion esti- 
mation schemes find motion vectors in the original image 
domain using constant block size for block matching (see,
for example, K. M. Uz et al., "Interpolative Multiresolution
Coding of Advanced Television with Compatible Subchannels,"
IEEE Transactions on Circuits and Systems for
Video Technology, Vol. 1, No. 1 (March 1991)). In coding
applications employing this multiresolution representation, 
motion estimation is performed by comparing a region being 
coded in a current frame with regions in a previous frame to 
compute a displacement vector in accordance with a match- 
ing criterion. 

An inherent disadvantage is that the motion estimation is 
unable to detect motion activities for small objects in the 
lower resolution representations. 

SUMMARY OF THE INVENTION

A method of computing motion vectors for a frame in a
full-motion video sequence comprises the steps of: transforming
said video frame into a multifrequency, multiresolution
domain representation by decomposing said frame
into a plurality of subframes each with an associated resolution
and occupying a respective frequency band, wherein
one of said subframes has the lowest resolution; dividing
each subframe into a set of blocks defining a grid, wherein
the size of each block is based on the associated resolution
of said subframe; calculating a motion vector, relative to a
previous frame, for each block of said lowest resolution
subframe; calculating motion vectors of each block of said
other subframes, comprising the steps of scaling a motion
vector corresponding to a respective block of said lowest
resolution subframe, and calculating a motion vector, relative
to the previous frame, for said block under motion
compensation using the scaled motion vector.

BRIEF DESCRIPTION OF THE DRAWINGS 




FIG. 1 is a schematic illustration of an exemplary multi- 
resolution representation of a video frame; 

FIG. 2 is a frequency band distribution of the wavelet
decompositions in FIG. 1;

FIG. 3 depicts the spatial relationship between blocks in 
a current and prior video frame during a block-matching 
motion search; 

FIG. 4 illustrates exemplary vector displacements pro- 
duced during motion estimation; 

FIG. 5 is a block diagram for 1/4-pixel accuracy motion
search;

FIGS. 6A-6C graphically depict the S/N ratio of various
motion estimation algorithms for integer, 1/2, and 1/4 pixel
accuracies; and

FIGS. 7A-7C graphically depict the average entropy in a
luminance component for integer, 1/2, and 1/4 pixel accuracy
of various motion estimation algorithms.

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

The method disclosed in this invention estimates motion 
vectors in the pyramid domain after decomposition. Variable 
block size is used to adapt to different levels of the pyramid. 
It also avoids the drawback of constant-size multi-resolution 
motion compensation (MRMC) in describing small object 
motions. 

Compared with conventional transform coding, the
pyramid coding technique is more flexible and can be easily
adapted to the nature of the human visual system. It is also free
from blocking artifacts due to the nature of its global 
decomposition. In video coding, some type of inter-frame 
prediction is often used to remove the inter-frame redun- 
dancy. Motion-compensated prediction has been used as an 
efficient scheme for temporal prediction. In typical pyramid 
representations such as wavelet decomposition, a video 
frame is decomposed into a set of sub-frames with different 
resolutions corresponding to different frequency bands. 

A multiresolution representation of image f(k,l) with
resolution depth M consists of a sequence of sub-images in
a pyramid structure:

$$\{\, S_{2^M} f,\ [W^j_{2^M} f]_{j=1,2,3},\ \ldots,\ [W^j_{2^1} f]_{j=1,2,3} \,\}$$

The representation is preferably explained by organizing 
the sub-images into a pyramid structure, shown for exem- 
plary purposes in FIG. 1. This exemplary pyramid structure 
of resolution depth 3 consists of a total of 10 subbands with 
3 subbands at each layer and one low-pass subband at the 
apex. In general, the sequence of subimages
$\{S_{2^m} f : m = 1, \ldots, M\}$ represents the approximations of a
given video frame at multiple resolutions.
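
The ten-subband pyramid of resolution depth 3 can be illustrated with a minimal sketch. The text does not name a particular wavelet filter, so the 2x2 Haar-style averaging and differencing used here is an assumption chosen only for brevity; the function names `haar_split` and `build_pyramid`, the dictionary keys, and the assignment of the three detail bands to j = 1, 2, 3 are likewise hypothetical. The frame dimensions are assumed to be divisible by 2^depth.

```python
def haar_split(img):
    """Split an image (list of rows, even dimensions) into four half-size bands.

    Assumption: a simple 2x2 Haar-style filter stands in for the wavelet
    filters, which the text does not specify.
    """
    low, d1, d2, d3 = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 4.0)  # local average (low-pass band)
            rows[1].append((a - b + d - e) / 4.0)  # difference across columns
            rows[2].append((a + b - d - e) / 4.0)  # difference across rows
            rows[3].append((a - b - d + e) / 4.0)  # diagonal difference
        for band, row in zip((low, d1, d2, d3), rows):
            band.append(row)
    return low, d1, d2, d3


def build_pyramid(frame, depth=3):
    """Return a dict of subbands: three detail bands per layer plus one low-pass band."""
    pyramid = {}
    approx = frame
    for m in range(1, depth + 1):
        approx, w1, w2, w3 = haar_split(approx)
        for j, band in enumerate((w1, w2, w3), start=1):
            pyramid["W%d^%d" % (2 ** m, j)] = band  # e.g. "W2^1", "W4^3", "W8^2"
    pyramid["S%d" % (2 ** depth)] = approx          # low-pass band at the apex
    return pyramid
```

With `depth=3` the dictionary holds ten entries, matching the three subbands per layer and the single low-pass subband at the apex described above.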

This pyramid structure is useful when there are various
applications involving the same video source, each requiring
a certain video resolution with a distinct quality factor. For
example, a Common Intermediate Format (CIF) or a Quarter
CIF (QCIF) would be required for video telephony
applications, while the CCIR 601 resolution and its subsampled
version would be used in TV broadcasting environments.

As shown in FIG. 1, conversions among different resolutions
are realized by assuming a simple relationship
among different video source resolutions. For example, the
subimage $S_4$, which is appropriate for video telephony
applications, is formed by combining subimage $S_8$ with the
wavelet representations from layer 3, namely $W_8^1$, $W_8^2$, and
$W_8^3$. Similarly, the subimage $S_2$ is produced by combining
subimage $S_4$ with the wavelets from layer 2. In this manner,
signal conversions may be made among subimages with
varying resolutions.

In pyramid coding schemes, a video frame is first divided 
into a multiresolution structure. Compared to transform 
coding, the pyramid structure is flexible and can be easily 
adapted to the nature of the human visual system. It is also 
free from blocking artifacts due to the nature of its global 
decomposition. For example, the wavelet transformation 
decomposes a video frame into a set of subbands with 
different resolutions, each corresponding to a different fre- 
quency band. These subimages are then organized into a 
pyramid structure to provide a pictorial representation of the 
signal decomposition. The wavelets (subimages) which
occupy the same layer of the pyramid represent motion
information at the same scale but with different frequency
bands.

These multiresolution frames provide a representation of
motion structure at different scales. The motion activities for
a particular sub-frame in different resolutions are hence
highly correlated since they actually specify the same
motion structure at different scales. In the multi-resolution
motion compensation scheme (MRMC) described hereinbelow,
motion vectors in higher resolution are predicted by the
motion vectors in the lower resolution and are refined at
each step. Prior to motion estimation, the video signal
undergoing such predictive estimation is preferably divided
into a plurality of blocks to facilitate block-based motion
estimation.

In accordance with the present invention, an MRMC
scheme is proposed which recognizes and accommodates
the characteristics of the human visual sensory apparatus.
Human vision, for example, is more sensitive to errors in
lower frequency stimuli than to those incurred in higher
frequency stimuli. The MRMC scheme of the present invention
approximates this feature of human vision by adapting the
size of each block to its associated scale (resolution). This
variable block size MRMC scheme not only considerably
reduces the searching and matching time but also provides
a meaningful characterization of the intrinsic motion structure.
The variable-size MRMC approach also avoids the
drawback of a constant-size MRMC in describing small
object motion activities. The MRMC scheme described here
can also be well adapted to motion-compensated interpolation.

Instructive texts on discrete wavelet theory and multi-
resolution analysis may be found in:

I. Daubechies, "Orthonormal bases of compactly supported
wavelets," Comm. Pure Appl. Math., vol. XLI, pp. 909-996, 1988;

S. Mallat, "Multifrequency channel decompositions of
images and wavelet models," IEEE Trans. Acoust. Speech
Signal Processing, vol. 37, no. 12, December 1989, pp. 2091-2110;

S. Mallat, "A theory for multiresolution signal decomposition:
The wavelet representation," IEEE Trans. Pattern
Anal. Machine Intell., vol. 11, no. 7, July 1989, pp. 674-693;

P. Burt, "Multiresolution techniques for image representation,
analysis, and 'smart' transmission," SPIE Visual
Communications and Image Processing IV, vol. 1199,
Philadelphia, Pa., November 1989;

P. Burt and E. Adelson, "The Laplacian pyramid as a
compact image code," IEEE Trans. Commun., vol. COM-31,
pp. 532-540, April 1983;

M. Vetterli, "Multidimensional subband coding: Some
theory and algorithms," Signal Processing, vol. 6, pp.
97-112, April 1984;

J. Woods and S. O'Neil, "Subband coding of images," IEEE
Trans. Acoust. Speech, Signal Processing, vol. ASSP-34,
no. 5, pp. 1278-1288, October 1986;

E. Adelson, E. Simoncelli, and R. Hingorani, "Orthogonal
pyramid transforms for image coding," SPIE Visual Communications
and Image Processing II, Boston, Mass., vol.
845, pp. 50-58, October 1987;

and E. Adelson, "Orthogonal Pyramid Transforms for Image
Coding," SPIE Visual Communications and Image Processing
II, Vol. 845, pp. 50-58 (1987), all incorporated
herein by reference.
General Motion Estimation

Motion estimation schemes can be divided into two
categories, namely block-matching and pel-recursive. A
block-matching scheme divides the picture into a number of
blocks on the assumption that all pels within a block belong
to the same object and thus possess the same motion activity.
In contrast, pel-recursive schemes estimate the motion of
each individual pixel.

In a block-matching motion estimation scheme, each
block in the present frame is matched to a particular block
in the previous frame(s) to find the positional (horizontal-
vertical) displacements of that block. Suitable matching
criteria might, for example, include the Maximum Cross
Correlation (MCC), Minimum Mean Squared Error
(MMSE), or Minimum Absolute Difference (MAD).

For purposes of illustrating the principle of block-matching,
reference is made to the pictorial representation in FIG.
3. A current frame i and prior frame i-1 are shown, wherein
the pixel value at location $(x_1, y_1)$ for current frame i is
assigned a value $I_i(x_1, y_1)$. The objective of block matching
is to find a mapping function between $I_i(x_1, y_1)$ and a
corresponding pixel value in the previous frame which
satisfies the appropriate matching criteria. In particular, a
vector $V_i(x, y)$ is sought which will reconstruct $I_i(x_1, y_1)$
from a pixel value $I_{i-1}(x_1 + x, y_1 + y)$ in prior frame i-1 with
minimum residual error, wherein x and y denote the translation
in the horizontal and vertical directions, respectively.

When viewed as a matching function, the block-matching
scheme finds a best match for $I_i(x_1, y_1)$ in the previous frame
(i-1) which is displaced from the present location of $(x_1, y_1)$
by $V_i(x, y)$.

The full range of x and y over which the matching vector
$V_i(x, y)$ is determined is designated as a search area $\Omega$. A
representative block 21 in frame i has the pixel value
$I_i(x_1, y_1)$, and corresponds positionally to a block 23 in prior
frame (i-1). During block-matching, a best match is sought
between block 23 and another block within a search region
22 ($\Omega$). The bordering region 25 includes the remainder of
the frame, but is not searched. The matching scheme finds a
matching block 24 whose vector displacement V(x,y)
defines the x-y displacement of block 24 from block 23.

The matching criterion employs MAD, which calculates
the absolute difference between $I_i(x, y)$ and each block
$I_{i-1}(x, y)$ in the neighborhood area $\Omega$ and chooses the block
producing the minimum difference value. The residual video
frame from motion estimation corresponding to this difference
signal is quantized, coded, and transmitted.
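
The block-matching search described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the invention: frames are assumed to be lists of rows of numeric pixel values, the search area is assumed to extend n pixels in each direction around the block position, candidates falling outside the previous frame are skipped, and the names `block_mad` and `full_search` are hypothetical.

```python
def block_mad(cur, prev, top, left, size, dx, dy):
    """Mean absolute difference between a block in the current frame and the
    block displaced by (dx, dy) in the previous frame."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - prev[top + r + dy][left + c + dx])
    return total / float(size * size)


def full_search(cur, prev, top, left, size, n):
    """Try every displacement within +/-n pixels and keep the smallest MAD.

    Returns (dx, dy, mad). Assumes at least one candidate lies inside the frame.
    """
    rows, cols = len(prev), len(prev[0])
    best = None
    for dy in range(-n, n + 1):
        for dx in range(-n, n + 1):
            inside = (0 <= top + dy and top + dy + size <= rows and
                      0 <= left + dx and left + dx + size <= cols)
            if not inside:
                continue  # candidate block would leave the previous frame
            err = block_mad(cur, prev, top, left, size, dx, dy)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best[1], best[2], best[0]
```

The returned pair (dx, dy) plays the role of the displacement vector V(x,y), and the per-pixel differences at that displacement form the residual that is subsequently quantized and coded.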

The performance of the motion estimation scheme is 
affected by a number of factors, including the size of the 
search area $\Omega$. For example, objects moving at high speeds
are better tracked as the area increases in size. This feature 
would be necessary in high-speed video sequences fre- 
quently depicted in television scenes, as opposed to slower 
moving objects in videoconferencing applications. By com- 
parison, a small search area can easily track the motion of 
slow-moving objects such as in videoconferencing scenes. 

Increasing the search area, however, means increasing the
time required to find the motion vectors and the associated
overhead. Assuming a search complexity of $O(n^2)$ per
pixel, where n is the size of the search area in the horizontal
and vertical directions, a typical frame size of NxN would
produce an overall complexity of $O[(Nn)^2]$ using a full
search algorithm. Consequently, in order to reduce the
computational burden of the full search algorithm, which
searches the entire search area $\Omega$, other sub-optimal schemes
have been proposed. These proposed schemes utilize a
relatively large search area but reduce the complexity by
avoiding searching the entire area, thus rendering a trade-off
between performance and search time. As an example, the
2D directed search has a complexity of $(2 + 7\log_2 n)$ and is
an extension of the bubble sort, whereas the orthogonal
search algorithm reduces the number of searches to
$(1 + 4\log_2 n)$.
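
The trade-off can be made concrete by evaluating the three search counts quoted above for a given n. The sketch below only evaluates those expressions; the function name `search_counts` and the dictionary keys are hypothetical, and the full-search count is taken as n squared per the stated per-pixel complexity.

```python
import math


def search_counts(n):
    """Number of candidate evaluations for search-area size n (n >= 1)."""
    return {
        "full_search": n * n,                # O(n^2) per pixel, as stated in the text
        "2d_directed": 2 + 7 * math.log2(n), # 2 + 7 log2(n)
        "orthogonal": 1 + 4 * math.log2(n),  # 1 + 4 log2(n)
    }


counts = search_counts(8)
# For n = 8: 64 candidates for the full search, 23 for the 2D directed
# search, and 13 for the orthogonal search.
```

The logarithmic schemes grow far more slowly with n than the full search, which is the reduction in search time referred to above, obtained at the cost of a possibly sub-optimal match.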



Multi-Resolution Motion Compensation for 
Multiple Resolution Video 
As illustrated in FIG. 1 and discussed supra, a video frame
to be analyzed is decomposed into a plurality of subimages 
with different resolutions and different frequency bands, and 
organized into a multi-layer pyramid structure. The subim- 
ages occupying the same layer have a common resolution 
(scale), wherein each of the subimages occupies a distinct 
frequency band. The frequency allocation for the subimages 
is shown in FIG. 2. Although the motion activities at 
different layers of the pyramid are different, the subimages 
from different layers are highly correlated since they actu- 
ally characterize the same motion structure but at different 
scales and for different frequency ranges. 

The illustrative video frame of FIG. 1 is decomposed into
ten subimages each occupying a respective frequency subband.
The subimages are organized into a three-level pyramid
representation with three subbands at each of the first
two levels and four subbands at the top level in which
subband $S_8$ represents the lowest frequency band. This
subband $S_8$ contains a major percentage of the total energy
present in the original frame although it is only 1/64 of its
size.

As noted above, the principal feature of the present
invention is the incorporation of human visual characteristics
pertaining to motion perception into inter-frame motion
estimation. In particular, the inter-frame motion estimation
variably adapts the size of the individual subimage blocks
based upon the resolution factor of the subimage. These
variable-size blocks take into account the fact that human
vision is more sensitive to errors in lower frequencies than to
those incurred in higher bands by giving more weight to the
lower resolution bands. Human vision also tends to be
selective in spatial orientation and positioning; accordingly,
the blocks at higher resolutions are chosen to be larger than
those at lower resolutions. In addition, errors generated by
motion estimation at the lowest resolution subbands will be
propagated and expanded to all subsequent lower layer
subbands. Therefore, motion activities in higher layers
should be more accurately characterized than those in lower
layers.

As explained earlier, all ten subbands have a highly
correlated motion activity. The present invention implements
the block size adaptation by preferably using a
variable-size block of $p2^{M-m}$ by $p2^{M-m}$ for the m-th level,
thereby ensuring that the motion estimator tracks the same
object/block regardless of the resolution or frequency band,
where p by p is the block size of the highest layer M.
Variable-size blocks thus appropriately weigh the importance
of different layers to match the human visual perception.
This scheme can detect motions for small objects in the
highest level of the pyramid. Disadvantageously, constant
block-size MRME approaches tend to ignore the motion
activities for small objects in higher levels of the pyramid
because a block size of p x p actually corresponds to a block
size of $p2^{M-m} \times p2^{M-m}$ in the m-th layer. Although the size of
the blocks is preferably computed in accordance with the
relationship noted above, this should not serve as a limitation
of the present invention. Rather, it should be obvious to
those skilled in the art that other block sizes are possible in
representing the adaptability of the block size to desired
characteristics of the human visual apparatus.

For 1-pixel accuracy, the variable-size MRMC approach
requires far fewer computations than its fixed-size
counterpart since no interpolation is needed as the grid
refines. In variable-size MRME, an accurate characterization
of motion information at the highest layer subband
produces very low energy in the displaced residual subbands
(DRS) and results in much "cleaner" copies of DRS for lower
layer subbands. In contrast, interpolation is often required to
obtain similar results when using fixed-size block schemes.
The variable-block size MRME scheme in accordance with
the present invention works directly on the frequency
domain, so many problems such as block discontinuity are
avoided.

As a general principle in multi-resolution motion estimation
schemes, the motion vectors are first calculated for the
lowest-resolution subband on the top of the pyramid. Then,
motion vectors in lower layers of the pyramid are refined
using the motion information obtained in higher layers, after
reconstruction of the subimages in the spatial domain. The
specific steps of such a variable block-size MRME scheme
are outlined below for FIG. 4 in connection with the
exemplary video representation of FIG. 1.

As noted above, FIG. 3 depicts motion estimation
between a block 21 in current frame i (for a certain subimage)
and a block in frame i-1 which satisfies a matching
criterion, namely block 24 with vector displacement V(x,y). FIG.
4 comprehensively illustrates motion estimation for an
exemplary block in each subimage of the pyramid video
representation of FIG. 1. More specifically, the ten grid areas
designated 41-50 correspond to respective regions in the ten
subimages $S_8$, $\{W_8^j\}_{j=1,2,3}$, $\{W_4^j\}_{j=1,2,3}$, and
$\{W_2^j\}_{j=1,2,3}$, respectively.

The blocks with shading, such as exemplary block 51,
correspond to subimage blocks in current frame i, while the
other block 52 corresponds to a block in previous frame i-1
satisfying the block matching criterion. The dashed-arrow
vector 53 represents an initial estimate determined originally
for subimage $S_8$. The solid-arrow vector 54 is the computed
refinement vector which, when added to the initial estimate
vector 53, produces the displacement vector defining the
positional translation from block 51 in frame i to the
matching block 52 in frame i-1.

As an initial step in the motion estimation scheme, the
motion vector $V_8(x,y)$ for the highest layer subband $S_8$ is
calculated by full search with an exemplary block size of
2x2 and search area $\Omega$ of 9x9. This translates to an area of
36x36 at the original scale of the frame. The motion vector
is then appropriately scaled to be used as an initial estimate
in higher resolution subbands. The scaled vector is used as
an initial bias and then refined for lower layer wavelets using
a full search with a smaller search zone. In this scheme, a
block under predictive motion estimation is first vectorially
displaced by the scaled vector (e.g., a scaled version of the
initial bias) in the previous frame and the motion algorithm
is then implemented to find the refinement vector $\Delta(x,y)$.

For example, in grid 48 corresponding to subimage $W_2^1$ in
layer 1 of the pyramid representation of FIG. 1, block 52 was
determined by the block-matching algorithm to satisfy the
chosen matching criteria. Vector 53 is a scaled version of the
motion vector $V_8(x,y)$ serving as an initial estimate for the
motion vector between blocks 51 and 52. The extending
vector 54 represents the incremental vector $\Delta(x,y)$ between
the initial estimate 53 and the location of block 52.
Since the scale at every level goes down by 2 in each
dimension, the motion vectors are scaled up by the same
factor and used as an initial bias for the predictive search
scheme, which refines this estimate by using full search with
a reduced search area. The value of $\Omega'$ is again 9x9 but note
that the block size has been increased by a factor of two in
each dimension, thus reducing the effective search area for
that resolution.
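
The variable block size and the scale-then-refine step can be sketched together. This is an illustrative sketch under stated assumptions, not the patented implementation: subimages are lists of rows of numeric values, MAD is used as the matching criterion, the refinement searches n positions on either side of the scaled bias, and the names `block_size`, `mad`, and `refine` are hypothetical.

```python
def block_size(p, M, m):
    """Block side at level m when the highest level M uses p-by-p blocks."""
    return p * 2 ** (M - m)


def mad(cur, prev, top, left, size, dx, dy):
    """Mean absolute difference for a block displaced by (dx, dy)."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - prev[top + r + dy][left + c + dx])
    return total / float(size * size)


def refine(cur, prev, top, left, size, bias, n):
    """Full search of +/-n positions around the scaled bias vector.

    Returns (vector, delta), where vector = bias + delta.
    Assumes at least one candidate lies inside the previous subimage.
    """
    rows, cols = len(prev), len(prev[0])
    best = None
    for ddy in range(-n, n + 1):
        for ddx in range(-n, n + 1):
            dx, dy = bias[0] + ddx, bias[1] + ddy
            inside = (0 <= top + dy and top + dy + size <= rows and
                      0 <= left + dx and left + dx + size <= cols)
            if not inside:
                continue  # skip candidates outside the previous subimage
            err = mad(cur, prev, top, left, size, dx, dy)
            if best is None or err < best[0]:
                best = (err, (dx, dy), (ddx, ddy))
    return best[1], best[2]
```

For instance, with M = 3 and p = 2, a vector (1, 1) found at the highest level becomes the bias (4, 4) at level 1, where the block is 8x8; the refinement search then only has to recover the small residual displacement.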

In general, the motion vector for frame i at any resolution
m and band j (j = 1, 2, 3) is estimated by:

$$V_i^{(m),j}(x,y) = E[V_i^{(m),j}(x,y)] + \Delta^{(m)}(x,y) \qquad (1)$$

The initial estimate $E[V_i^{(m),j}(x,y)]$ can be found using an
autoregressive (AR) prediction model given by:

$$E[V_i^{(m),j}(x,y)] = \sum_{(p,q,n) \in \Theta} \alpha_{pqn}\, V_{i-p}^{(m+n),q}(x,y) \qquad (2)$$

where $\alpha_{pqn}$ is a set of prediction coefficients for each block
and p, q, n are members of a set $\Theta$ defined by:

$$\Theta = \{p,q,n : p \ge 0\} \cup \{q,n : n = 0, 1, \ldots, (M-m)\ \forall q\} \cup \{p,q,n : q = 0, \ldots, 3\}$$

The refinement term $\Delta(x,y)$ is obtained using the matching
criteria MAD, for example, and is computed as follows:

$$\Delta^{(m)}(x,y) = \arg\min_{(x,y) \in \Omega'} \frac{1}{XY} \sum_{p = x_m - X/2}^{x_m + X/2}\ \sum_{q = y_m - Y/2}^{y_m + Y/2} \left| I_i(p,q) - I_{i-1}(p + x, q + y) \right| \qquad (4)$$

where $\Omega'$ is the sub-search area at the m-th layer, and the
block being matched is X by Y pixels and centered at $(x_m, y_m)$.

For sub-pixel accuracy, frame (i-1) is first interpolated to
r times its original size using bilinear interpolation. The
process of block matching for 1/4-pixel accuracy is shown in
FIG. 5. A block in frame (i) is first matched to its
corresponding sample points in frame (i-1). The matching
grid is then shifted by one sample point in frame (i-1), which
corresponds to 1/r-pixel at the original scale. This increases
the search complexity by a factor of $(r^2 - 1)$. Since the
motion vectors for the highest layer are scaled up by a factor
of $2^{M-m}$, interpolation is required at the next layer if
$2^{M-m}/r$ is less than one. Hence, for 1/2-pixel accuracy, no
interpolation is required at the lower layers.

Several variations in the implementation of the predictive
search algorithm for multi-resolution motion estimation are
possible. The following algorithms are exemplary implementations
which are presented for illustrative purposes
only and should not serve as a limitation of the present
invention.

A-I: Motion vectors are calculated for $S_8$ and no further
refinement is carried out. The values are scaled according
to the resolution and used for all the other subbands.

A-II: The motion vectors are calculated for all four lowest
resolution frequency bands, i.e., $S_8$ and $\{W_8^j : j = 1, 2, 3\}$.
These values are appropriately scaled and used as the
motion vectors for all the corresponding higher resolution
subbands in the corresponding bands.

A-III: The motion vectors for $S_8$ are calculated and used
as an initial bias for all the other frequency bands and
resolutions.

A-IV: The motion vectors are calculated for all frequency
bands at the highest level, i.e., $S_8$ and $\{W_8^j : j = 1, 2, 3\}$ are
estimated and used as initial bias for refining the
motion vectors for all lower levels using the corresponding
band motion vectors.

Algorithms A-I and A-III use the simplest model in Equation
(1), supra, where all of the prediction coefficients are zero
except for the coefficient of the $S_8$ motion vector, which is
set to $2^{M-m}$. Thus, the motion vectors at the resolution m are
given by:

$$V_i^{(m),j}(x,y) = 2^{M-m}\, V_i^{(M),0}(x,y) + \Delta^{(m)}(x,y) \qquad (5)$$

for j = 1, 2, 3, where band 0 denotes the low-pass subband $S_8$.
For algorithm A-I, $\Delta^{(m)}(x,y)$ is set to zero.
Similar equations apply to algorithms A-II and A-IV.

The algorithms are implemented in general terms with a
frame size of NxN, a search area of $\pm n$ in each direction, and
a desired accuracy of 1/r-pixel. The number of computations
for each block is $(r^2 - 1)(2n + 1)^2$ for r > 1. As a result of the
wavelet decomposition, there are three subbands of size
(N/2)x(N/2) and (N/4)x(N/4) each, and four subbands of
size (N/8)x(N/8). Assuming that a block size of p is chosen
for the lowest resolution (e.g., the highest level of the
pyramid), the number of blocks in each subband is $N^2/64p^2$.
Thus, the total number of searches for subband $S_8$ is given
by $N^2(r^2 - 1)(2n + 1)^2/64p^2$. Since the search area is r(2n+1) in
each direction, the number of bits required to represent one
component of a motion vector will be less than or equal to
$\lceil \log_2 r(2n + 1) \rceil$, depending upon the entropy. The
multi-resolution approach gains its advantage from the fact that one
pixel motion at level m translates to $2^{M-m}$ pixels at the
original scale.
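
The search counts and the motion-vector bit bound above can be evaluated numerically with a short sketch. The function names are hypothetical, the factor 64 corresponds to a three-level pyramid as in the text, and the r = 1 case (for which the text gives no explicit formula) is assumed here to cost (2n+1)^2 searches per block.

```python
import math


def searches_per_block(n, r):
    """Searches per block for a +/-n search area and 1/r-pixel accuracy."""
    base = (2 * n + 1) ** 2
    if r == 1:
        return base              # assumption: integer-pixel case, not given in the text
    return (r * r - 1) * base    # (r^2 - 1)(2n + 1)^2 for r > 1


def top_level_blocks(N, p):
    """Number of p-by-p blocks in an (N/8)x(N/8) subband: N^2 / (64 p^2)."""
    return (N * N) // (64 * p * p)


def vector_component_bits(n, r):
    """Upper bound on bits per motion-vector component: ceil(log2(r(2n + 1)))."""
    return math.ceil(math.log2(r * (2 * n + 1)))


# Example values (chosen for illustration only): N = 64, p = 2, n = 4, r = 2.
total = top_level_blocks(64, 2) * searches_per_block(4, 2)  # 16 * 243 = 3888
bits = vector_component_bits(4, 2)                           # ceil(log2(18)) = 5
```

Because the highest-level subband holds only 1/64 of the pixels of the original frame, the full search carried out there is correspondingly cheap.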



TESTBED IMPLEMENTATION 

The variable-block size MRME scheme described here- 
inabove was implemented in a testbed including a digital 
video recorder with realtime acquisition/playback of 25 
seconds of video signals. The recorder is interfaced to a host 
machine where the compression algorithms are applied to 
the acquired frames. The reconstructed video segments are 
then transferred back to the video recorder and viewed at full 
frame rate for comparison with the original signal. The 
implementation used a test sequence 'CAR', which is a
full-motion interlaced color video sequence. 

The implementation testbed permitted a comparative 
evaluation of the compression performance, quality degra- 
dation, and computational efficiency for the various coding 
algorithms A-I through A-IV described above. Tables 1-3 
presented below illustrate the energy distributions of the 
algorithms for different pixel accuracies. FIGS. 6 and 7 
graphically depict the S/N ratio and average entropy, respec- 
tively, for different pixel accuracies. 






5,477,272 



TABLE 1

Energies in a Displaced Residual Wavelet frame for Algorithms A-I through A-IV

Energy            S8        W8^1     W8^2    W8^3    W4^1     W4^2   W4^3   W2^1    W2^2  W2^3
Original Signal   49587.23  7361.20  452.91  148.47  1391.86  65.89  18.46  203.53  7.48  3.31
A-I               336.00    848.63   155.74  193.20  428.40   53.07  37.12  110.65  5.26  1.28
A-II              336.00    195.58   44.32   28.34   414.99   44.54  21.62  97.47   4.31  0.55
A-III             336.00    186.16   44.56   29.01   138.83   13.58  4.65   39.96   1.02  0.05
A-IV              336.00    195.58   44.32   28.34   139.71   15.14  7.37   38.61   0.78  0.19
Table 1 shows the variance or the energy distribution
among different subbands for the luminance component of a
typical frame before and after applying the algorithms A-I
through A-IV. It can be seen from the first row of the table,
which shows the energy in the original subbands before
applying any form of motion compensation, that the wavelet
decomposition compacts most of the energy in $S_8$ for the
original video signal. Typically, $S_8$ contains about 85% of
the energy in the original signal although its scale is down
by a factor of eight in both dimensions. After motion
compensation, the energies in all subbands are considerably
reduced. The reduction is at least an order of magnitude
for the highest layer subbands and more than two orders of
magnitude for $S_8$. This significant decrease of energy in the
perceptually most significant subband is a result of the
accurate motion estimation. This layer is the most important
layer in terms of visual perception and is appropriately
weighed in variable block-size multi-resolution motion estimation
schemes. The importance of accurate motion estimation
is further emphasized by the results obtained with
sub-pixel accuracies in estimating the motion vectors at the
highest level. Tables 2 and 3 below show the energy distributions
of the subbands for 1/2- and 1/4-pixel accuracies,
respectively.
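
The energy compaction figures quoted above can be checked directly from the Table 1 values. The sketch below is only an arithmetic illustration; the list and function names are hypothetical, and the values are the "Original Signal" and A-IV rows of Table 1 in the subband order S8, W8 (three bands), W4 (three bands), W2 (three bands).

```python
# Energies from Table 1 (luminance component, integer-pixel accuracy).
ORIGINAL = [49587.23, 7361.20, 452.91, 148.47, 1391.86, 65.89, 18.46, 203.53, 7.48, 3.31]
A_IV = [336.00, 195.58, 44.32, 28.34, 139.71, 15.14, 7.37, 38.61, 0.78, 0.19]


def energy_share(energies, index=0):
    """Fraction of the total energy held by the subband at the given index."""
    return energies[index] / sum(energies)


def reduction_factors(before, after):
    """Per-subband ratio of energy before and after motion compensation."""
    return [b / a for b, a in zip(before, after)]


share_s8 = energy_share(ORIGINAL)            # roughly 0.84 of the total energy
factors = reduction_factors(ORIGINAL, A_IV)  # factors[0] is roughly 148 for S8
```

The S8 share of roughly 84% agrees with the "about 85%" stated above, and the reduction factor of roughly 148 for S8 corresponds to the stated decrease of more than two orders of magnitude.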



TABLE 2

Energies in DRS for 1/2-pixel accuracy at highest layer

Energy   S8      W8^1    W8^2    W8^3    W4^1    W4^2   W4^3   W2^1    W2^2  W2^3
A-I      218.67  865.94  196.91  133.90  452.05  48.81  20.03  110.74  5.14  0.76
A-II     218.67  121.51  38.87   20.60   522.38  50.47  25.53  113.80  5.14  0.81
A-III    218.67  122.19  38.92   19.60   112.47  15.91  4.62   37.88   0.90  0.07
A-IV     218.67  121.51  38.87   20.60   150.61  15.57  6.49   41.01   0.88  0.24


TABLE 3

Energies in DRS for 1/4-pixel accuracy at highest layer

Energy   S8      W8^1    W8^2    W8^3    W4^1    W4^2   W4^3   W2^1    W2^2  W2^3
A-I      139.09  823.49  150.83  118.02  645.07  47.38  37.48  127.73  4.55  1.09
A-II     139.09  85.64   31.50   14.32   645.07  47.38  37.48  153.67  4.69  1.26
A-III    139.09  84.86   31.93   13.77   84.01   8.41   4.12   43.45   1.00  0.05
A-IV     139.09  85.64   31.50   14.32   84.01   8.41   4.12   44.35   0.88  0.24



Although the energy decreases dramatically for $S_8$, it may even increase for other subbands if the motion vectors are not refined, as in the case of Algorithms A-I and A-II. However, even when the motion vectors are refined, as in Algorithms A-III and A-IV, some anomalies may still arise because of the reduced search area Ω. As an example, the energy in $W_4^1$ using A-IV is 139.71 for integer pixel accuracy and 150.61 when 1/2-pixel accuracy is used. In general, Algorithms A-III and A-IV produce less energy than A-I and A-II regardless of the pixel accuracy.

The energy distribution among the chrominance components U and V follows a pattern similar to that in the Y-component. In most cases, the luminance signal contains more than 60% of the total energy of the original signal, and the U and V components each contain less than 20%. In order to appropriately weigh the Y, U and V components according to this distribution, the normalizing factor which controls the quantization is set to a higher value for the U and V components than for the luminance signal. Higher values result in coarser quantization, thus reducing the emphasis given to that component.

FIGS. 6A-6C show the reconstructed Signal-to-Noise Ratio (SNR) of the four algorithms for integer, 1/2 and 1/4 pixel accuracies. Clearly, algorithm A-IV gives the best performance, closely followed by A-III. The search complexity and motion overhead for both alternatives are the same. The difference in the reconstructed SNR between the two algorithms is less than 1 dB for 1-pixel accuracy and becomes negligible as the accuracy is increased. Similar results hold true for the other two algorithms, which follow each other very closely. The SNR of algorithm A-II is not much different from that of A-I despite the four-fold increase in computation and overhead. A 2 dB gain can be achieved by using A-III or A-IV over the other two, but at the expense of added complexity and overhead. It should be noted that since the instantaneous SNR highly depends upon the amount of motion present in the scene, the curves follow the same general pattern for all the algorithms and all the accuracies. The SNR for integer pixel accuracy is slightly better than the corresponding sub-pixel counterparts.
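As an illustration of how a reconstructed SNR figure of this kind could be computed, the following is a minimal sketch. The text does not state which SNR definition was used, so the ratio of signal power to reconstruction-error power is an assumption here, and the function name `snr_db` is hypothetical.

```python
import math

def snr_db(original, reconstructed):
    """SNR in dB between two equally long sample sequences.

    Assumed definition (not taken from the text):
    10 * log10(sum(original^2) / sum((original - reconstructed)^2)).
    """
    signal = sum(o * o for o in original)
    noise = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal / noise)

# Two samples of value 10, each reconstructed with an error of 1:
# signal power 200, error power 2, ratio 100.
example = snr_db([10, 10], [9, 11])
```

Under this definition, a 2 dB gain corresponds to an error power lower by a factor of 10^(0.2), or roughly 1.58.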
The first-order entropies of the ten subbands in the luminance signal after quantization are shown in FIGS. 7A-7C. The average entropy translates directly to the number of bits required to encode the whole frame and thus contributes to the instantaneous bit rate of the coder. Algorithms A-III and A-IV again outperform A-I and A-II, with the lowest entropy, as expected. There is a marked difference in the entropy, and thus the bit rate, offered by these two schemes as compared to algorithms A-I and A-II. Although the entropy is directly related to the amount of motion in a particular frame, the increase at the 8th frame, in particular, is due to the periodic refreshment frame being transmitted. After every eight frames, an intra-frame coded picture is transmitted to provide synchronization and eliminate accumulation of quantization errors. The entropy of the ten individual subbands for a typical frame using 1/4-pixel accuracy is shown in Table 4 below.
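The relationship between the normalizing factor, the quantized coefficients and the first-order entropy can be sketched as follows. This is an illustration under stated assumptions, not the coder described here: the quantizer (divide by the normalizing factor and truncate toward zero) and the area weighting used for the frame average (a subband at scale s holds 1/s^2 of the frame's pixels) are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter

def quantize(coeffs, norm_factor):
    # Assumed quantizer: divide by the normalizing factor, truncate toward zero.
    return [int(c / norm_factor) for c in coeffs]

def first_order_entropy(symbols):
    # Shannon entropy (bits/symbol) of the empirical symbol histogram.
    n = len(symbols)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(symbols).values())

def average_bits_per_pixel(entropies_by_scale):
    # Assumed weighting: each subband at scale s covers 1/s^2 of the frame.
    return sum(sum(h) / (s * s) for s, h in entropies_by_scale.items())

coeffs = [0.4, -0.3, 5.2, -7.9]
fine = first_order_entropy(quantize(coeffs, 1.0))     # symbols [0, 0, 5, -7]
coarse = first_order_entropy(quantize(coeffs, 10.0))  # all symbols become 0

# Row A-I of Table 4, grouped by scale (S_8 and W_8^j, then W_4^j, then W_2^j).
row_a1 = {8: [2.00, 2.62, 1.58, 1.31], 4: [2.13, 0.43, 0.33], 2: [0.51, 0.05, 0.01]}
avg_a1 = average_bits_per_pixel(row_a1)
```

With the larger normalizing factor every coefficient in the toy example truncates to zero and the entropy drops to zero, which is the behavior described for subbands that take no part in the reconstruction. Under the assumed area weighting, the A-I row evaluates to about 0.44 bits/pixel, in line with the tabulated average.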
TABLE 4

Entropy (bits/pixel) of DRS for 1/4-pixel accuracy

Entropy      S_8   W_8^1   W_8^2   W_8^3   W_4^1   W_4^2   W_4^3   W_2^1   W_2^2   W_2^3  Average
A-I         2.00    2.62    1.58    1.31    2.13    0.43    0.33    0.51    0.05    0.01     0.44
A-II        2.00    1.57    0.67    0.51    2.13    0.43    0.33    0.53    0.05    0.01     0.40
A-III       2.00    1.58    0.69    0.50    1.03    0.16    0.12    0.23    0.01    0.00     0.22
A-IV        2.00    1.57    0.67    0.51    1.03    0.16    0.12    0.25    0.01    0.00     0.22

$S_8$ has the highest entropy as it contains most of the information. The contribution to the bit rate from the highest layer is the most significant despite the smaller sizes of its subbands as compared to the subbands in the other layers. The values of entropy depend on the normalizing factor discussed earlier and the amount of motion in a particular direction. Wavelets in the same layer exhibit different behavior in terms of energy content and entropy depending on the motion present in the direction to which they are sensitive. Some of the subbands show a value of zero, which means that the coefficients after normalizing are insignificant and thus truncated to zero. Such a subband will not play any part in the reconstruction. An example is $W_2^3$ for Algorithms A-III and A-IV.

While there has been shown and described herein what are presently considered the preferred embodiments of the invention, it will be obvious to those skilled in the art that various changes and modifications can be made therein without departing from the scope of the invention as defined by the appended Claims.

What is claimed is:

1. A method of computing motion vectors for a frame in a full-motion video sequence, comprising the steps of:

transforming said frame into a multifrequency, multiresolution domain representation by decomposing said frame into a plurality of subframes each subframe having an associated resolution m and occupying a respective frequency band, wherein one subframe of said plurality of subframes has the lowest resolution;

dividing each subframe into a set of blocks defining a grid, wherein the size of each block is based on the associated resolution of said subframe;

calculating a first motion vector, relative to a previous frame, for each block of said subframe with the lowest resolution;

calculating a second motion vector by motion compensation for each block of each other subframe of said plurality of subframes, comprising the steps of

scaling said first motion vector corresponding to a respective block of said subframe with the lowest resolution to obtain a scaled motion vector, and

calculating a second motion vector, relative to the previous frame, for each block under motion compensation using the scaled motion vector as an initial bias.

2. The method as recited in claim 1 wherein the dividing step includes the step of:

computing the size of each block based on the relationship $\{p2^{M-m} \times p2^{M-m}\}$, wherein p is a constant, and M represents the total number of different resolutions employed in said transforming step.

3. The method as recited in claim 1 wherein the step of calculating a first motion vector includes the steps of:

defining a search area within the previous frame having a like set of blocks;

comparing each block of said subframe with the lowest resolution with each block of said search area to find a matching block in said previous frame satisfying a match criteria;

wherein said first motion vector represents a positional displacement of the block in said subframe with the lowest resolution, relative to the matching block in said previous frame.

4. The method as recited in claim 1 wherein the scaling step includes the step of:

scaling said first motion vector proportional to the ratio of the resolution of a subframe to the resolution of said subframe with the lowest resolution;

wherein said respective block corresponds to the block in said lowest resolution subframe in a like position as the block under motion compensation.

5. The method as recited in claim 4 wherein the scaling step includes the step of:

multiplying said first motion vector from the subframe having the lowest resolution by the quantity $2^{M-m}$, wherein M represents the total number of different resolutions employed in said transforming step.

6. A compression method for a frame f(t) in a full-motion video sequence, comprising the steps of:

deriving a wavelet representation of said frame defined by a plurality of subimages each with a different resolution and corresponding to a respective frequency band, wherein one of said subimages has the lowest resolution;

organizing each of said subimages into a grid of non-overlapping image blocks;

calculating a motion field for each image block of said subimage with the lowest resolution by block-matching an image block with each block within a defined search area of the preceding frame in said video sequence until an optimal match is found;

deriving a bias motion vector for each block in each other subimage of said plurality of subimages, said bias motion vector representing a scaled version of the motion field for a respectively corresponding block in said subimage with the lowest resolution; and

calculating a motion field, relative to the preceding frame, for each block using the respective bias motion vector.

7. The method as recited in claim 6 wherein said block-matching includes the step of:

executing a prespecified distortion function.

8. The method as recited in claim 6 wherein the step of deriving a wavelet representation includes the step of:

wavelet transforming said frame f(t) between a scale of $2^1$ and $2^M$ into a sequence of subimages characterized by

$$\{S_{2^M} f(t),\ [W^j_{2^M} f(t)]_{j=1,2,3},\ \ldots,\ [W^j_{2^1} f(t)]_{j=1,2,3}\}$$

wherein $S_{2^M} f(t)$ is a smoothed version of f(t) spanned by a scaling function at resolution $2^M$, and $W^j_{2^i} f(t)$ (i=1 to M, j=1,2,3) is a wavelet transform function corresponding to an approximation of f(t) at resolution $2^i$.
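The block-size rule of claim 2, the vector scaling of claim 5 and the biased refinement of claim 1 can be sketched in a few lines. This is an illustrative sketch only, not the claimed implementation: it assumes that m = M denotes the lowest-resolution subframe (so that the factor 2^(M-m) equals 1 there), uses the sum of absolute differences as the distortion function, and represents frames as lists of rows. The names `block_size`, `scale_vector`, `sad` and `refine`, and the `radius` parameter, are hypothetical.

```python
def block_size(p, M, m):
    # Side length of the square block at resolution level m: p * 2^(M-m).
    return p * 2 ** (M - m)

def scale_vector(v, M, m):
    # Scale a lowest-resolution motion vector (dy, dx) by 2^(M-m).
    factor = 2 ** (M - m)
    return (v[0] * factor, v[1] * factor)

def sad(cur, ref, y, x, dy, dx, size):
    # Assumed distortion function: sum of absolute differences between the
    # current block at (y, x) and the reference block displaced by (dy, dx).
    return sum(abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
               for i in range(size) for j in range(size))

def refine(cur, ref, y, x, size, bias, radius):
    # Search a small window centred on the bias vector and return the
    # displacement with the lowest distortion (the bias if none is valid).
    h, w = len(ref), len(ref[0])
    best, best_cost = bias, None
    for dy in range(bias[0] - radius, bias[0] + radius + 1):
        for dx in range(bias[1] - radius, bias[1] + radius + 1):
            if not (0 <= y + dy <= h - size and 0 <= x + dx <= w - size):
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, y, x, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

# Toy example: the current subimage is the reference shifted by (1, 2).
ref = [[y * 12 + x for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x + 2] if y + 1 < 12 and x + 2 < 12 else 0
        for x in range(12)] for y in range(12)]
size = block_size(2, 3, 2)           # p = 2, M = 3, level m = 2
bias = scale_vector((0, 1), 3, 2)    # vector found at the lowest resolution
vector = refine(cur, ref, 4, 4, size, bias, 1)
```

Because the search is centred on the scaled vector, only a small window around the bias is examined at each finer level, which is where the reduction in searching and matching time comes from.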